Amit Shetty
Describe the objective of this assignment. You can briefly state how you accomplish it.
The primary goal of this assignment is to understand how neural networks behave when training on data. To explore this, I will implement and test three techniques: k-fold cross validation (using 5 folds to train and test the model), nonlinear regression, and nonlinear logistic regression. The idea is to introduce nonlinearity when fitting and classifying the data, which leads us to higher-order polynomials, i.e. polynomial regression. Since polynomial regression can be computationally expensive, neural networks are the obvious choice: the work of computing higher-order functions is split among the different layers of neurons, and the final outputs are the weights that help fit the data correctly. To check how efficient neural networks can be, we will use different activation functions such as relu, tanh and sigmoid. The primary concern is that, without controlling the number of layers, we may train the model to such a high accuracy that it overfits.
Introduce your data and visualize them. Describe your observations about the data. You can reuse the data that you examined in Assignment #1 (of course for regression).
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).
While the datasets are divided into two different wine types, the characteristics used to calculate wine quality remain the same for both.
Input variables (based on physicochemical tests): 1 - fixed acidity 2 - volatile acidity 3 - citric acid 4 - residual sugar 5 - chlorides 6 - free sulfur dioxide 7 - total sulfur dioxide 8 - density 9 - pH 10 - sulphates 11 - alcohol
Target variable (based on sensory data): 12 - quality (score between 0 and 10)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df_red = pd.read_csv("winequality_red.csv")
df_white = pd.read_csv("winequality_white.csv")
We add a new color column to the red and white wine datasets so that the two wines can be distinguished after the datasets are merged.
df_red["color"] = "R"
df_white["color"] = "W"
Merging the two red and white wine datasets together for eventual training and testing
df_all=pd.concat([df_red,df_white],axis=0)
df_all.head()
To avoid any issues with spaces when processing the data, we rename the columns, replacing spaces with _ characters.
df_white.rename(columns={'fixed acidity': 'fixed_acidity','citric acid':'citric_acid','volatile acidity':'volatile_acidity','residual sugar':'residual_sugar','free sulfur dioxide':'free_sulfur_dioxide','total sulfur dioxide':'total_sulfur_dioxide'}, inplace=True)
df_red.rename(columns={'fixed acidity': 'fixed_acidity','citric acid':'citric_acid','volatile acidity':'volatile_acidity','residual sugar':'residual_sugar','free sulfur dioxide':'free_sulfur_dioxide','total sulfur dioxide':'total_sulfur_dioxide'}, inplace=True)
df_all.rename(columns={'fixed acidity': 'fixed_acidity','citric acid':'citric_acid','volatile acidity':'volatile_acidity','residual sugar':'residual_sugar','free sulfur dioxide':'free_sulfur_dioxide','total sulfur dioxide':'total_sulfur_dioxide'}, inplace=True)
df_all.head()
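The three rename calls above can be collapsed into one vectorised operation; a minimal sketch on a toy frame (the column labels here are just a subset for illustration):

```python
import pandas as pd

# Replace every space in the column labels with an underscore in one pass,
# instead of enumerating each column in a rename() mapping.
toy = pd.DataFrame(columns=["fixed acidity", "free sulfur dioxide", "pH"])
toy.columns = toy.columns.str.replace(" ", "_")
print(list(toy.columns))  # ['fixed_acidity', 'free_sulfur_dioxide', 'pH']
```

The same line applied to df_red, df_white and df_all would replace all three rename calls.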
Dummy variables for modelling. Other variables will be normalised when the models are created.
df = pd.get_dummies(df_all, columns=["color"])
df
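As a toy illustration of what get_dummies does to the color column (the values here are made up):

```python
import pandas as pd

# get_dummies expands a categorical column into one 0/1 indicator column
# per category; here "color" becomes color_R and color_W.
toy = pd.DataFrame({"alcohol": [9.4, 10.2], "color": ["R", "W"]})
encoded = pd.get_dummies(toy, columns=["color"])
print(list(encoded.columns))  # ['alcohol', 'color_R', 'color_W']
```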
# Checking for any null values
df_all.isnull().sum()
df_all.describe()
Plotting correlation matrices for both red and white wines individually and for the whole combination
plt.subplots(figsize=(20,15))
ax = plt.axes()
ax.set_title("Wine Characteristic Correlation Heatmap (Reds)")
corr = df_red.corr()
sns.heatmap(corr,
xticklabels=corr.columns.values,
yticklabels=corr.columns.values, annot=True, cmap="Reds")
plt.show()
plt.subplots(figsize=(20,15))
ax = plt.axes()
ax.set_title("Wine Characteristic Correlation Heatmap (Whites)")
corr = df_white.corr()
sns.heatmap(corr,
xticklabels=corr.columns.values,
yticklabels=corr.columns.values, annot=True, cmap = "Blues")
plt.show()
plt.subplots(figsize=(20,15))
ax = plt.axes()
ax.set_title("Wine Characteristic Correlation Heatmap (All wines)")
corr = df_all.corr()
sns.heatmap(corr,
xticklabels=corr.columns.values,
yticklabels=corr.columns.values, annot=True, cmap="Oranges")
plt.show()
Testing the association between wine density and residual sugar content for the red and white wine datasets
scat1 = sns.regplot(x = "density", y = "residual_sugar", fit_reg = True, color='r', data = df_red)
plt.xlabel("Density of wine")
plt.ylabel("Residual sugar in wine in grams")
plt.title("Association between RED wine's density and residual sugar")
plt.show()
scat1 = sns.regplot(x = "density", y = "residual_sugar", fit_reg = True, color='b', data = df_white)
plt.xlabel("Density of wine")
plt.ylabel("Residual sugar in wine in grams")
plt.title("Association between WHITE wine's density and residual sugar")
plt.show()
We will now check how wine quality is distributed for the red and white wines
df_red["quality"] = pd.Categorical(df_red["quality"])
sns.countplot(x="quality", data=df_red)
plt.xlabel("Quality level of RED wine (0-10 scale)")
plt.show()
df_white["quality"] = pd.Categorical(df_white["quality"])
sns.countplot(x="quality", data=df_white)
plt.xlabel("Quality level of WHITE wine (0-10 scale)")
plt.show()
One of the key factors affecting quality is the amount of alcohol in the wine
sns.catplot(x="quality", y="alcohol", data=df_red, kind="strip")
plt.xlabel("Quality level of wine, 0-10 scale")
plt.ylabel("Alcohol level in wine in % ABV")
plt.title("Alcohol percent in each level of RED wine's quality")
plt.show()
sns.catplot(x="quality", y="alcohol", data=df_white, kind="strip")
plt.xlabel("Quality level of wine, 0-10 scale")
plt.ylabel("Alcohol level in wine in % ABV")
plt.title("Alcohol percent in each level of WHITE wine's quality")
plt.show()
To summarise the data exploration, we plot the pairwise relationships between all of the variables
sns.pairplot(df_red, vars=df_red.columns[:-1])
plt.title("Pair plot showing RED wine characteristics")
plt.show()
sns.pairplot(df_white, vars=df_white.columns[:-1])
plt.title("Pair plot showing WHITE wine characteristics")
plt.show()
sns.pairplot(df_all, vars=df_all.columns[:-1])
plt.title("Pair plot showing ALL wine characteristics")
plt.show()
Given a dataset, it is split into K sections/folds, and each fold is used as a testing set at some point. Data can be split into any number of folds depending on its size and the amount of testing and validation to be performed. For this exercise, I will focus on 5-fold cross validation, where the data is split into 5 separate folds/sections. In the first iteration, the first fold is used to test the model and the rest are used to train it. In the second iteration, the second fold is used as the testing set while the rest serve as the training set. This process is repeated until each of the 5 folds has been used as the testing set.
K-fold cross validation was created because it is extremely difficult to check how accurate a particular machine learning model is. The usual practice is to split the data into 2 or 3 parts, i.e. training and testing sets (and in some cases a validation set). We then evaluate model performance with an error metric to determine the accuracy of the model. This method, however, is not very reliable, as the accuracy obtained on one test set can be very different from the accuracy obtained on another. K-fold cross validation (CV) solves this problem by dividing the data into folds and ensuring that each fold is used as a testing set at some point.
Normally, k-fold cross validation is implemented using the scikit-learn library. In this exercise, I will implement it in plain Python.
The following approach will be used:
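For comparison, the fold bookkeeping implemented by hand below is what scikit-learn's KFold provides out of the box; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
kf = KFold(n_splits=5)
for train_idx, test_idx in kf.split(X):
    # Each of the 5 folds serves as the test set exactly once;
    # the remaining 4 folds form the training set.
    print(len(train_idx), len(test_idx))  # 8 2, five times
```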
from sklearn.model_selection import train_test_split
from sklearn.model_selection import ParameterGrid
def crossValidation(folds, X1):
    X = X1[:, :11]          # feature columns only (exclude the quality and color targets)
    T = X1[:, [11, 12]]     # quality and color_R as the regression targets
    XTrain, XTest, TTrain, TTest = train_test_split(X, T, test_size=0.2)
    number = int(XTrain.shape[0] / folds)
    param_grid = [{'optim': ['scg'], 'Lambda': [.001, .01, .1, 1]}]  # Combinations of hyperparameters ('Lambda' matches the name train() pops)
    grid = list(ParameterGrid(param_grid))  # Generating the list of all possible combinations of hyperparameters
    hiddenUnits = [5, 10, 15, 20, 25]
    arr_test = []
    final_param = []
    test_index = []
    bestUnits = []
    for i in range(folds):
        print("Running fold {}".format(i+1))
        lower = number * i
        upper = number * (i+1)
        X_test, T_test = XTrain[lower:upper, :], TTrain[lower:upper, :]  # Using the i-th fold as our test data
        XTrain_rem, TTrain_rem = np.delete(XTrain, np.s_[lower:upper], axis=0), np.delete(TTrain, np.s_[lower:upper], axis=0)  # Deleting the test part and assigning the remaining part
        grid_index = []
        arr_cv = []
        hunits = []
        for j in range(folds - 1):  # Iterating over the remaining four folds and splitting into validation and training data
            low = number * j
            high = number * (j+1)
            X_validate, T_validate = XTrain_rem[low:high, :], TTrain_rem[low:high, :]
            XTrain_cv, TTrain_cv = np.delete(XTrain_rem, np.s_[low:high], axis=0), np.delete(TTrain_rem, np.s_[low:high], axis=0)
            for k in range(len(grid)):  # Iterating over all the combinations of hyperparameters
                for units in range(len(hiddenUnits)):
                    nn_cv = NeuralNet([XTrain.shape[1], hiddenUnits[units], TTrain.shape[1]])
                    nn_cv.train(XTrain_cv, TTrain_cv, **grid[k])
                    ypred_cv = nn_cv.use(X_validate)
                    arr_cv.append(np.sqrt(np.mean((T_validate - ypred_cv)**2)))
                    grid_index.append(k)
                    hunits.append(hiddenUnits[units])
        minimum = arr_cv.index(min(arr_cv))  # Getting the index of the minimum error from the array
        min_index = grid_index[minimum]      # Storing the index of the best hyperparameters
        best_hunit = hunits[minimum]
        print(" Best Hyperparameters: ", grid[min_index])
        print(" Best number of hidden units: ", best_hunit)
        nn_cv = NeuralNet([XTrain.shape[1], best_hunit, TTrain.shape[1]])
        nn_cv.train(XTrain_rem, TTrain_rem, **grid[min_index])
        ypred_test = nn_cv.use(X_test)
        arr_test.append(np.sqrt(np.mean((T_test - ypred_test)**2)))  # Calculating test error with the best parameters from the validation sets
        test_index.append(min_index)  # Storing the corresponding grid index with the RMSE
        bestUnits.append(best_hunit)
        print(" RMSE value for this test set: ", np.sqrt(np.mean((T_test - ypred_test)**2)))
    index = arr_test.index(min(arr_test))  # Index of the best hyperparameters after running all fold test cases
    final_param = grid[test_index[index]]
    print("Final Hyperparameters are: {}".format(final_param))
    final_units = bestUnits[index]
    print("Final hidden units are: {}".format(final_units))
    nn_cv = NeuralNet([XTrain.shape[1], final_units, TTrain.shape[1]])
    nn_cv.train(XTrain, TTrain, **final_param)
    ypred = nn_cv.use(XTest)
    print("Final RMSE after 5-fold cross validation: {}".format(np.sqrt(np.mean((TTest - ypred)**2))))
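The RMSE expression repeated throughout the function above can be factored into a small helper; a sketch (the helper name is mine):

```python
import numpy as np

def rmse(T, Y):
    """Root-mean-square error between targets T and predictions Y."""
    return np.sqrt(np.mean((np.asarray(T) - np.asarray(Y)) ** 2))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```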
Nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. In the past, advanced modelers would work with nonlinear functions, including exponential functions, logarithmic functions, trigonometric functions, power functions, Gaussian function, and Lorenz curves. Some of these functions, such as the exponential or logarithmic functions, would then be transformed so that they would be linear. When so transformed, standard linear regression would be performed, but the classical approach has significant problems, especially if the modeler is working with larger datasets and/or if the data includes missing values, nonlinear relationships, local patterns and interactions.
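The transform-then-fit approach mentioned above can be sketched for an exponential model: taking logs of $y = a e^{bx}$ gives $\ln y = \ln a + bx$, which ordinary least squares handles directly (synthetic, noise-free data; the parameter values are made up):

```python
import numpy as np

# Exponential data y = a * exp(b * x); the log-transform makes it linear.
a_true, b_true = 2.0, 0.5
x = np.linspace(0.0, 4.0, 30)
y = a_true * np.exp(b_true * x)

# Fit ln(y) = ln(a) + b*x with a degree-1 polynomial (ordinary least squares).
b_est, log_a_est = np.polyfit(x, np.log(y), 1)
print(round(b_est, 6), round(np.exp(log_a_est), 6))  # 0.5 2.0
```

With noisy data this linearisation distorts the error structure, which is one of the problems with the classical approach noted above.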
In nonlinear regression, a statistical model of the form $y = f(x, \beta)$
relates a vector of independent variables, $x$, to its associated observed dependent variable, $y$. The function $f$ is nonlinear in the components of the parameter vector $\beta$, but otherwise arbitrary. For example, the Michaelis–Menten model for enzyme kinetics has two parameters and one independent variable, related by
$ f(x, \beta) = \frac{\beta_1 x}{\beta_2 + x} $
This function is nonlinear in the parameters because it cannot be expressed as a linear combination of the two $\beta$'s.
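As a sketch of fitting this model directly (rather than through a linearising transform), scipy.optimize.curve_fit can recover the parameters from synthetic, noise-free data; the parameter values here are made up:

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(x, b1, b2):
    # f(x, beta) = beta1 * x / (beta2 + x)
    return b1 * x / (b2 + x)

# Synthetic observations with beta1 = 5, beta2 = 2.
x = np.linspace(0.1, 10.0, 50)
y = michaelis_menten(x, 5.0, 2.0)

popt, _ = curve_fit(michaelis_menten, x, y, p0=[1.0, 1.0])
print(np.round(popt, 4))  # [5. 2.]
```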
The learning rate is meant to be customisable so that, once training starts, we can check whether we are overfitting the data. A good learning rate, combined with the right number of epochs, is neither too high nor too low: it produces just enough activation to give an output that fits the data well. Each iteration is a batch optimization, in which a full forward and backward propagation takes place over the entire dataset; too many iterations can cause overfitting, and too few can cause underfitting. A balance between these hyperparameters is essential. Nonlinear regression is largely about controlling them, along with the hidden-layer hyperparameters, to achieve the right amount of nonlinearity.
The following approach will be used to solve the problem:
""" Neural Network
    referenced NN code by Chuck Anderson in R and C++
    by Jake Lee (lemin)

    example usage:
        X = numpy.array([0,0,1,0,0,1,1,1]).reshape(4,2)
        T = numpy.array([0,1,1,0,1,0,0,1]).reshape(4,2)
        nn = nnet.NeuralNet([2,3,2])
        nn.train(X, T, wprecision=1e-20, fprecision=1e-2)
        Y = nn.use(X)
"""
import numpy as np
from copy import copy
from grad import scg, steepest
from util import Standardizer  # used by train() below

class NeuralNet:
    def __init__(self, nunits):
        self._nLayers = len(nunits) - 1
        self.rho = [1] * self._nLayers
        self._W = []
        wdims = []
        lenweights = 0
        for i in range(self._nLayers):
            nwr = nunits[i] + 1            # +1 row for the bias weight
            nwc = nunits[i+1]
            wdims.append((nwr, nwc))
            lenweights = lenweights + nwr * nwc
        self._weights = np.random.uniform(-0.1, 0.1, lenweights)
        start = 0  # fixed index error 20110107
        for i in range(self._nLayers):
            end = start + wdims[i][0] * wdims[i][1]
            self._W.append(self._weights[start:end])
            self._W[i].resize(wdims[i])    # views into the flat weight vector
            start = end
        self.stdX = None
        self.stdT = None
        self.stdTarget = True

    def add_ones(self, w):
        return np.hstack((np.ones((w.shape[0], 1)), w))

    def get_nlayers(self):
        return self._nLayers

    def set_hunit(self, w):
        for i in range(self._nLayers - 1):
            if w[i].shape != self._W[i].shape:
                print("set_hunit: shapes do not match!")
                break
            else:
                self._W[i][:] = w[i][:]

    def pack(self, w):
        return np.hstack(map(np.ravel, w))

    def unpack(self, weights):
        self._weights[:] = weights[:]  # unpack

    def cp_weight(self):
        return copy(self._weights)

    def RBF(self, X, m=None, s=None):
        if m is None: m = np.mean(X)
        if s is None: s = 2  # np.std(X)
        r = 1. / (np.sqrt(2*np.pi) * s)
        return r * np.exp(-(X - m) ** 2 / (2 * s ** 2))

    def forward(self, X):
        t = X
        Z = []
        for i in range(self._nLayers):
            Z.append(t)
            if i == self._nLayers - 1:
                t = np.dot(self.add_ones(t), self._W[i])           # linear output layer
            else:
                t = np.tanh(np.dot(self.add_ones(t), self._W[i]))  # tanh hidden layers
                #t = self.RBF(np.dot(np.hstack((np.ones((t.shape[0],1)),t)),self._W[i]))
        return (t, Z)

    def backward(self, error, Z, T, lmb=0):
        delta = error
        N = T.size
        dws = []
        for i in range(self._nLayers - 1, -1, -1):
            rh = float(self.rho[i]) / N
            if i == 0:
                lmbterm = 0
            else:
                lmbterm = lmb * np.vstack((np.zeros((1, self._W[i].shape[1])),
                                           self._W[i][1:,]))
            dws.insert(0, (-rh * np.dot(self.add_ones(Z[i]).T, delta) + lmbterm))
            if i != 0:
                delta = np.dot(delta, self._W[i][1:, :].T) * (1 - Z[i]**2)
        return self.pack(dws)

    def _errorf(self, T, Y):
        return T - Y

    def _objectf(self, T, Y, wpenalty):
        return 0.5 * np.mean(np.square(T - Y)) + wpenalty

    def train(self, X, T, **params):
        verbose = params.pop('verbose', False)
        # training parameters
        _lambda = params.pop('Lambda', 0.)
        # parameters for scg
        niter = params.pop('niter', 1000)
        wprecision = params.pop('wprecision', 1e-10)
        fprecision = params.pop('fprecision', 1e-10)
        wtracep = params.pop('wtracep', False)
        ftracep = params.pop('ftracep', False)
        # optimization
        optim = params.pop('optim', 'scg')
        if self.stdX is None:
            explore = params.pop('explore', False)
            self.stdX = Standardizer(X, explore)
        Xs = self.stdX.standardize(X)
        if self.stdT is None and self.stdTarget:
            self.stdT = Standardizer(T)
            T = self.stdT.standardize(T)

        def gradientf(weights):
            self.unpack(weights)
            Y, Z = self.forward(Xs)
            error = self._errorf(T, Y)
            return self.backward(error, Z, T, _lambda)

        def optimtargetf(weights):
            """ optimization target function : MSE """
            self.unpack(weights)
            Y, _ = self.forward(Xs)
            Wnb = np.array([])
            for i in range(self._nLayers):
                if len(Wnb) == 0: Wnb = self._W[i][1:,].reshape(self._W[i].size - self._W[i][0,].size, 1)
                else: Wnb = np.vstack((Wnb, self._W[i][1:,].reshape(self._W[i].size - self._W[i][0,].size, 1)))
            wpenalty = _lambda * np.dot(Wnb.flat, Wnb.flat)
            return self._objectf(T, Y, wpenalty)

        if optim == 'scg':
            result = scg(self.cp_weight(), gradientf, optimtargetf,
                         wPrecision=wprecision, fPrecision=fprecision,
                         nIterations=niter,
                         wtracep=wtracep, ftracep=ftracep,
                         verbose=False)
            self.unpack(result['w'][:])
            self.f = result['f']
        elif optim == 'steepest':
            result = steepest(self.cp_weight(), gradientf, optimtargetf,
                              nIterations=niter,
                              xPrecision=wprecision, fPrecision=fprecision,
                              xtracep=wtracep, ftracep=ftracep)
            self.unpack(result['w'][:])
        if ftracep:
            self.ftrace = result['ftrace']
        if 'reason' in result.keys() and verbose:
            print(result['reason'])
        return result

    def use(self, X, retZ=False):
        if self.stdX:
            Xs = self.stdX.standardize(X)
        else:
            Xs = X
        Y, Z = self.forward(Xs)
        if self.stdT is not None:
            Y = self.stdT.unstandardize(Y)
        if retZ:
            return Y, Z
        return Y
Logistic regression is a special type of regression in which the goal is to model the probability of something as a function of other variables. Consider a set of predictor vectors $x_1, \ldots, x_n$, where $n$ is the number of observations and $x_i$ is a column vector containing the values of the predictors for the $i$-th observation. The response variable for $x_i$ is $Y_i \sim \mathrm{Bin}(m_i, \pi_i)$, a binomial random variable with parameters $m_i$, the number of trials, and $\pi_i$, the probability of success for observation $i$. The normalized response variable is $Y_i / m_i$, the proportion of successes in $m_i$ trials for observation $i$. While logistic regression makes core assumptions about the observations, such as IID (each observation is independent of the others and they all have an identical probability distribution), the use of a linear decision boundary is not one of them; it is used for reasons of simplicity. In cases where we suspect the decision boundary to be nonlinear, it may make sense to formulate logistic regression with a nonlinear model and evaluate how much better we can do. As in nonlinear regression, we have the choice of several activation functions such as tanh, sigmoid and softmax. Given the size of the data and its classification nature, we will use the softmax function as the output: $$ g_k(x) = P(T=k \mid x) = \frac{e^{\kappa_k}}{\sum_{c=1}^K e^{\kappa_c}} $$
Nonlinear logistic regression uses the following hidden-layer weight update:
$$ V \leftarrow V + \alpha_h X_1^\top \Big( (T - g(X)) W^\top \odot (1 - Z^2) \Big), $$ where $\alpha_h$ and $\alpha_o$ are the learning rates for the hidden and output layers respectively, $X_1$ denotes the input matrix with a bias column of ones, and the output of the neural network is denoted $\kappa$.
The following approach is used to solve this problem:
Logistic regression has traditionally been used to come up with a hyperplane that separates the feature space into classes. But if we suspect that the decision boundary is nonlinear we may get better results by attempting some nonlinear functional forms for the logit function.
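The softmax output $g_k(x)$ defined earlier can be sketched directly in NumPy (the row-maximum shift is a standard numerical-stability trick and leaves the result unchanged):

```python
import numpy as np

def softmax(K):
    """Row-wise softmax: exp(kappa_k) / sum_c exp(kappa_c)."""
    K = K - K.max(axis=1, keepdims=True)   # stability shift; result unchanged
    e = np.exp(K)
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
print(probs.sum(axis=1))  # each row sums to 1
```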
from util import Standardizer
import numpy as np
import matplotlib.pyplot as plt
import copy
from grad import scg, steepest
class NeuralNetLogReg:
    def __init__(self, nunits):
        self._nLayers = len(nunits) - 1
        self.rho = [1] * self._nLayers
        self._W = []
        wdims = []
        lenweights = 0
        for i in range(self._nLayers):
            nwr = nunits[i] + 1            # +1 row for the bias weight
            nwc = nunits[i+1]
            wdims.append((nwr, nwc))
            lenweights = lenweights + nwr * nwc
        self._weights = np.random.uniform(-0.1, 0.1, lenweights)
        start = 0  # fixed index error 20110107
        for i in range(self._nLayers):
            end = start + wdims[i][0] * wdims[i][1]
            self._W.append(self._weights[start:end])
            self._W[i].resize(wdims[i])    # views into the flat weight vector
            start = end
        self.stdX = None
        self.stdT = None
        self.stdTarget = True

    def add_ones(self, w):
        return np.hstack((np.ones((w.shape[0], 1)), w))

    def get_nlayers(self):
        return self._nLayers

    def set_hunit(self, w):
        for i in range(self._nLayers - 1):
            if w[i].shape != self._W[i].shape:
                print("set_hunit: shapes do not match!")
                break
            else:
                self._W[i][:] = w[i][:]

    def pack(self, w):
        return np.hstack(map(np.ravel, w))

    def unpack(self, weights):
        self._weights[:] = weights[:]  # unpack

    def cp_weight(self):
        return copy.copy(self._weights)

    def RBF(self, X, m=None, s=None):
        if m is None: m = np.mean(X)
        if s is None: s = 2  # np.std(X)
        r = 1. / (np.sqrt(2*np.pi) * s)
        return r * np.exp(-(X - m) ** 2 / (2 * s ** 2))

    def forward(self, X):
        t = X
        Z = []
        for i in range(self._nLayers):
            Z.append(t)
            if i == self._nLayers - 1:
                t = 1/(1+np.exp(-np.dot(self.add_ones(t), self._W[i])))  # sigmoid output layer
            else:
                t = np.tanh(np.dot(self.add_ones(t), self._W[i]))        # tanh hidden layers
        return (t, Z)

    def backward(self, error, Z, T, lmb=0):
        delta = error
        N = T.size
        dws = []
        for i in range(self._nLayers - 1, -1, -1):
            rh = float(self.rho[i]) / N
            if i == 0:
                lmbterm = 0
            else:
                lmbterm = lmb * np.vstack((np.zeros((1, self._W[i].shape[1])),
                                           self._W[i][1:,]))
            dws.insert(0, (-rh * np.dot(self.add_ones(Z[i]).T, delta) + lmbterm))
            if i != 0:
                delta = np.dot(delta, self._W[i][1:, :].T) * (1 - Z[i]**2)
        return self.pack(dws)

    def _errorf(self, T, Y):
        return T - Y

    def _objectf(self, T, Y, wpenalty):
        return -(np.sum(np.sum((T * np.log(Y)), axis=1), axis=0)) + wpenalty

    def train(self, X, T, **params):
        verbose = params.pop('verbose', False)
        # training parameters
        _lambda = params.pop('Lambda', 0.05)
        # parameters for scg
        niter = params.pop('niter', 1000)
        wprecision = params.pop('wprecision', 1e-10)
        fprecision = params.pop('fprecision', 1e-10)
        wtracep = params.pop('wtracep', False)
        ftracep = params.pop('ftracep', False)
        # optimization
        optim = params.pop('optim', 'scg')
        if self.stdX is None:
            explore = params.pop('explore', False)
            self.stdX = Standardizer(X, explore)
        Xs = self.stdX.standardize(X)
        # target standardization is intentionally disabled ("and False") for classification
        if self.stdT is None and self.stdTarget and False:
            self.stdT = Standardizer(T)
            T = self.stdT.standardize(T)

        def gradientf(weights):
            self.unpack(weights)
            Y, Z = self.forward(Xs)
            error = self._errorf(T, Y)
            return self.backward(error, Z, T, _lambda)

        def optimtargetf(weights):
            """ optimization target function : cross entropy """
            self.unpack(weights)
            Y, _ = self.forward(Xs)
            Wnb = np.array([])
            for i in range(self._nLayers):
                if len(Wnb) == 0: Wnb = self._W[i][1:,].reshape(self._W[i].size - self._W[i][0,].size, 1)
                else: Wnb = np.vstack((Wnb, self._W[i][1:,].reshape(self._W[i].size - self._W[i][0,].size, 1)))
            wpenalty = _lambda * np.dot(Wnb.flat, Wnb.flat)
            return self._objectf(T, Y, wpenalty)

        if optim == 'scg':
            result = scg(self.cp_weight(), gradientf, optimtargetf,
                         wPrecision=wprecision, fPrecision=fprecision,
                         nIterations=niter,
                         wtracep=wtracep, ftracep=ftracep,
                         verbose=False)
            self.unpack(result['w'][:])
            self.f = result['f']
        elif optim == 'steepest':
            result = steepest(self.cp_weight(), gradientf, optimtargetf,
                              nIterations=niter,
                              xPrecision=wprecision, fPrecision=fprecision,
                              xtracep=wtracep, ftracep=ftracep)
            self.unpack(result['w'][:])
        if ftracep:
            self.ftrace = result['ftrace']
        if 'reason' in result.keys() and verbose:
            print(result['reason'])
        return result

    def use(self, X, retZ=False):
        if self.stdX:
            Xs = self.stdX.standardize(X)
        else:
            Xs = X
        Y, Z = self.forward(Xs)
        if self.stdT is not None:
            Y = self.stdT.unstandardize(Y)
        if retZ:
            return Y, Z
        return Y
Calculating the correlation matrix to see how each characteristic relates to the final quality
df_all.corr()
Testing cross validation on Non Linear Logistic Regression
categorical_cleanup = {"color":{"R":1, "W":0}}
Encoding the color column as 1/0 (R→1, W→0); it will act as our classification target
df_all.replace(categorical_cleanup, inplace=True)
df_all.head()
from sklearn.model_selection import train_test_split
from sklearn.model_selection import ParameterGrid
def crossValidationLogReg(folds, X):
    X1 = X[:, :-2]          # feature columns only (exclude quality and color)
    T = X[:, 12:]           # the encoded color column as the classification target
    XTrain, XTest, TTrain, TTest = train_test_split(X1, T, test_size=0.2)  # split the features, not the full array
    number = int(XTrain.shape[0] / folds)
    param_grid = [{'optim': ['scg'], 'Lambda': [.001, .01, .1, 1]}]  # 'Lambda' matches the name train() pops
    grid = list(ParameterGrid(param_grid))
    hiddenUnits = [5, 10, XTrain.shape[1], 15, 20]
    arr_test = []
    final_param = []
    test_index = []
    bestUnits = []
    for i in range(folds):
        print("Running fold {}".format(i+1))
        lower = number * i
        upper = number * (i+1)
        X_test, T_test = XTrain[lower:upper, :], TTrain[lower:upper, :]  # Using the i-th fold as our test data
        XTrain_rem, TTrain_rem = np.delete(XTrain, np.s_[lower:upper], axis=0), np.delete(TTrain, np.s_[lower:upper], axis=0)  # Deleting the test part and assigning the remaining part
        grid_index = []
        arr_cv = []
        hunits = []
        for j in range(folds - 1):
            low = number * j
            high = number * (j+1)
            X_validate, T_validate = XTrain_rem[low:high, :], TTrain_rem[low:high, :]
            XTrain_cv, TTrain_cv = np.delete(XTrain_rem, np.s_[low:high], axis=0), np.delete(TTrain_rem, np.s_[low:high], axis=0)
            for k in range(len(grid)):
                for units in range(len(hiddenUnits)):
                    nn_cv = NeuralNetLogReg([XTrain.shape[1], hiddenUnits[units], TTrain.shape[1]])
                    nn_cv.train(XTrain_cv, TTrain_cv, **grid[k])
                    ypred_cv = nn_cv.use(X_validate)
                    arr_cv.append(np.sqrt(np.mean((T_validate - ypred_cv)**2)))
                    grid_index.append(k)
                    hunits.append(hiddenUnits[units])
        minimum = arr_cv.index(min(arr_cv))  # Index of the minimum validation error
        min_index = grid_index[minimum]      # Index of the best hyperparameters
        best_hunit = hunits[minimum]
        print(" Best Hyperparameters: ", grid[min_index])
        print(" Best number of hidden units: ", best_hunit)
        nn_cv = NeuralNetLogReg([XTrain.shape[1], best_hunit, TTrain.shape[1]])
        nn_cv.train(XTrain_rem, TTrain_rem, **grid[min_index])
        ypred_test = nn_cv.use(X_test)
        arr_test.append(np.sqrt(np.mean((T_test - ypred_test)**2)))  # Test error with the best parameters from the validation sets
        test_index.append(min_index)  # Storing the corresponding grid index with the RMSE
        bestUnits.append(best_hunit)
        print(" RMSE value for this test set: ", np.sqrt(np.mean((T_test - ypred_test)**2)))
    index = arr_test.index(min(arr_test))  # Index of the best hyperparameters after running all fold test cases
    final_param = grid[test_index[index]]
    print("Final Hyperparameters are: ", final_param)
    final_units = bestUnits[index]
    print("Final hidden units are: ", final_units)
    nn_cv = NeuralNetLogReg([XTrain.shape[1], final_units, TTrain.shape[1]])
    nn_cv.train(XTrain, TTrain, **final_param)
    ypred = nn_cv.use(XTest)
    print("Final RMSE after 5-fold cross validation: ", np.sqrt(np.mean((TTest - ypred)**2)))
    plt.plot(TTest[:100])
    plt.plot(ypred[:100])
    plt.show()

crossValidationLogReg(5, df_all.values)
Testing cross validation on non linear regression
from util import Standardizer
import numpy as np
import matplotlib.pyplot as plt
import copy
from grad import scg, steepest
class NeuralNet:
def __init__(self, nunits):
self._nLayers=len(nunits)-1
self.rho = [1] * self._nLayers
self._W = []
wdims = []
lenweights = 0
for i in range(self._nLayers):
nwr = nunits[i] + 1
nwc = nunits[i+1]
wdims.append((nwr, nwc))
lenweights = lenweights + nwr * nwc
self._weights = np.random.uniform(-0.1,0.1, lenweights)
start = 0 # fixed index error 20110107
for i in range(self._nLayers):
end = start + wdims[i][0] * wdims[i][1]
self._W.append(self._weights[start:end])
self._W[i].resize(wdims[i])
start = end
self.stdX = None
self.stdT = None
self.stdTarget = True
def add_ones(self, w):
return np.hstack((np.ones((w.shape[0], 1)), w))
def get_nlayers(self):
return self._nLayers
def set_hunit(self, w):
for i in range(self._nLayers-1):
if w[i].shape != self._W[i].shape:
print("set_hunit: shapes do not match!")
break
else:
self._W[i][:] = w[i][:]
def pack(self, w):
return np.hstack(map(np.ravel, w))
def unpack(self, weights):
self._weights[:] = weights[:] # unpack
def cp_weight(self):
return copy.copy(self._weights)
def RBF(self, X, m=None,s=None):
if m is None: m = np.mean(X)
if s is None: s = 2 #np.std(X)
r = 1. / (np.sqrt(2*np.pi)* s)
return r * np.exp(-(X - m) ** 2 / (2 * s ** 2))
def forward(self,X):
t = X
Z = []
for i in range(self._nLayers):
Z.append(t)
if i == self._nLayers - 1:
t = np.dot(self.add_ones(t), self._W[i])
else:
t = np.tanh(np.dot(self.add_ones(t), self._W[i]))
#t = self.RBF(np.dot(np.hstack((np.ones((t.shape[0],1)),t)),self._W[i]))
return (t, Z)
def backward(self, error, Z, T, lmb=0):
delta = error
N = T.size
dws = []
for i in range(self._nLayers - 1, -1, -1):
rh = float(self.rho[i]) / N
if i==0:
lmbterm = 0
else:
lmbterm = lmb * np.vstack((np.zeros((1, self._W[i].shape[1])),
self._W[i][1:,]))
dws.insert(0,(-rh * np.dot(self.add_ones(Z[i]).T, delta) + lmbterm))
if i != 0:
delta = np.dot(delta, self._W[i][1:, :].T) * (1 - Z[i]**2)
return self.pack(dws)
def _errorf(self, T, Y):
return T - Y
def _objectf(self, T, Y, wpenalty):
return 0.5 * np.mean(np.square(T - Y)) + wpenalty
def train(self, X, T, **params):
verbose = params.pop('verbose', False)
# training parameters
_lambda = params.pop('Lambda', 0.)
#parameters for scg
niter = params.pop('niter', 1000)
wprecision = params.pop('wprecision', 1e-10)
fprecision = params.pop('fprecision', 1e-10)
wtracep = params.pop('wtracep', False)
ftracep = params.pop('ftracep', False)
# optimization
optim = params.pop('optim', 'scg')
if self.stdX is None:
explore = params.pop('explore', False)
self.stdX = Standardizer(X, explore)
Xs = self.stdX.standardize(X)
if self.stdT is None and self.stdTarget:
self.stdT = Standardizer(T)
T = self.stdT.standardize(T)
def gradientf(weights):
self.unpack(weights)
Y,Z = self.forward(Xs)
error = self._errorf(T, Y)
return self.backward(error, Z, T, _lambda)
def optimtargetf(weights):
""" optimization target function : MSE
"""
self.unpack(weights)
#self._weights[:] = weights[:] # unpack
Y,_ = self.forward(Xs)
Wnb=np.array([])
for i in range(self._nLayers):
if len(Wnb)==0: Wnb=self._W[i][1:,].reshape(self._W[i].size-self._W[i][0,].size,1)
else: Wnb = np.vstack((Wnb,self._W[i][1:,].reshape(self._W[i].size-self._W[i][0,].size,1)))
wpenalty = _lambda * np.dot(Wnb.flat ,Wnb.flat)
return self._objectf(T, Y, wpenalty)
if optim == 'scg':
result = scg(self.cp_weight(), gradientf, optimtargetf,
wPrecision=wprecision, fPrecision=fprecision,
nIterations=niter,
wtracep=wtracep, ftracep=ftracep,
verbose=False)
self.unpack(result['w'][:])
self.f = result['f']
elif optim == 'steepest':
result = steepest(self.cp_weight(), gradientf, optimtargetf,
nIterations=niter,
xPrecision=wprecision, fPrecision=fprecision,
xtracep=wtracep, ftracep=ftracep )
self.unpack(result['w'][:])
if ftracep:
self.ftrace = result['ftrace']
if 'reason' in result.keys() and verbose:
print(result['reason'])
return result
def use(self, X, retZ=False):
if self.stdX:
Xs = self.stdX.standardize(X)
else:
Xs = X
Y, Z = self.forward(Xs)
if self.stdT is not None:
Y = self.stdT.unstandardize(Y)
if retZ:
return Y, Z
return Y
from sklearn.model_selection import train_test_split
from sklearn.model_selection import ParameterGrid
def crossValidation(folds, X1):
X = X1[:, :-2]
T = X1[:, [11,12]]
XTrain, XTest, TTrain, TTest = train_test_split(X, T, test_size=0.2)
number = int(XTrain.shape[0] / folds)
param_grid = [{'optim': ['scg'], 'Lambda': [.001, .01, .1, 1]} ] # Combinations of hyperparameters; the key must be 'Lambda' to match params.pop('Lambda', ...) in train()
grid = list(ParameterGrid(param_grid)) # Generating the list all possible combinations of hyperparameters
hiddenUnits = [5, 10, XTrain.shape[1], 15, 20]
arr_cv = []
arr_test = []
final_param = []
test_index = []
bestUnits = []
for i in range(folds):
print("Running fold {}".format(i+1))
lower = number*i
upper = number*(i+1)
X_test, T_test = XTrain[lower:upper, :], TTrain[lower:upper, :] # Creating 1st fold as our test data
XTrain_rem, TTrain_rem = np.delete(XTrain, np.s_[lower:upper], axis= 0), np.delete(TTrain, np.s_[lower:upper], axis= 0) # Deleting the test part and assigning the remaining part
grid_index = []
arr_cv = []
hunits = []
for j in range(folds-1): # Iterating over next four folds and splitting for validation and training data
low = number*j
high = number*(j+1)
X_validate, T_validate = XTrain_rem[low:high, :], TTrain_rem[low:high, :]
XTrain_cv, TTrain_cv = np.delete(XTrain_rem, np.s_[low:high], axis= 0), np.delete(TTrain_rem, np.s_[low:high], axis= 0)
for k in range(len(grid)): # Iterating over all the combinations of hyperparameters
for units in range(len(hiddenUnits)):
nn_cv = NeuralNet([XTrain.shape[1], hiddenUnits[units], TTrain.shape[1]])
nn_cv.train(XTrain_cv, TTrain_cv, **grid[k])
ypred_cv = nn_cv.use(X_validate)
arr_cv.append(np.sqrt(np.mean((T_validate - ypred_cv)**2)))
grid_index.append(k)
hunits.append(hiddenUnits[units])
minimum = arr_cv.index(min(arr_cv)) # Getting the index of minimum errors from array
min_index = grid_index[minimum] # Storing that index of best hyperparam.
best_hunit = hunits[minimum]
print(" Best Hyperparameters: ", grid[min_index])
print(" Best number of hidden units: ", best_hunit)
nn_cv = NeuralNet([XTrain.shape[1], best_hunit, TTrain.shape[1]])
nn_cv.train(XTrain_rem, TTrain_rem, **grid[min_index])
ypred_test = nn_cv.use(X_test)
arr_test.append(np.sqrt(np.mean((T_test - ypred_test)**2))) # Calculating test error from best parameters from validation sets
test_index.append(min_index) # Storing the corresponding index from grid_index with rmse
bestUnits.append(best_hunit)
print(" RMSE value for this test set: ",np.sqrt(np.mean((T_test - ypred_test)**2)))
index= arr_test.index(min(arr_test)) # Storing the index of best hyperparameters after running onto all folds test cases.
final_param = grid[test_index[index]]
print("Final Hyperparameters are: {}".format(final_param))
final_units = bestUnits[index]
print("Final hidden units are: {}".format(final_units))
nn_cv = NeuralNet([XTrain.shape[1], final_units, TTrain.shape[1]])
nn_cv.train(XTrain, TTrain, **final_param)
ypred = nn_cv.use(XTest)
print("Final RMSE after 5-fold cross validation: {}".format(np.sqrt(np.mean((TTest - ypred)**2))))
plt.plot(TTest[:100])
plt.plot(ypred[:100])
plt.show()
crossValidation(5, df_all.values)
""" Neural Network
referenced NN code by Chuck Anderson in R and C++
by Jake Lee (lemin)
example usage:
X = numpy.array([0,0,1,0,0,1,1,1]).reshape(4,2)
T = numpy.array([0,1,1,0,1,0,0,1]).reshape(4,2)
nn = nnet.NeuralNet([2,3,2])
nn.train(X,T, wprecision=1e-20, fprecision=1e-2)
Y = nn.use(X)
"""
from grad import scg, steepest
from copy import copy
class NeuralNet:
def __init__(self, nunits):
self._nLayers=len(nunits)-1
self.rho = [1] * self._nLayers
self._W = []
wdims = []
lenweights = 0
for i in range(self._nLayers):
nwr = nunits[i] + 1
nwc = nunits[i+1]
wdims.append((nwr, nwc))
lenweights = lenweights + nwr * nwc
self._weights = np.random.uniform(-0.1,0.1, lenweights)
start = 0 # fixed index error 20110107
for i in range(self._nLayers):
end = start + wdims[i][0] * wdims[i][1]
self._W.append(self._weights[start:end])
self._W[i].resize(wdims[i])
start = end
self.stdX = None
self.stdT = None
self.stdTarget = True
def add_ones(self, w):
return np.hstack((np.ones((w.shape[0], 1)), w))
def get_nlayers(self):
return self._nLayers
def set_hunit(self, w):
for i in range(self._nLayers-1):
if w[i].shape != self._W[i].shape:
print("set_hunit: shapes do not match!")
break
else:
self._W[i][:] = w[i][:]
def pack(self, w):
return np.hstack(map(np.ravel, w))
def unpack(self, weights):
self._weights[:] = weights[:] # unpack
def cp_weight(self):
return copy(self._weights)
def RBF(self, X, m=None,s=None):
if m is None: m = np.mean(X)
if s is None: s = 2 #np.std(X)
r = 1. / (np.sqrt(2*np.pi)* s)
return r * np.exp(-(X - m) ** 2 / (2 * s ** 2))
def forward(self,X):
t = X
Z = []
for i in range(self._nLayers):
Z.append(t)
if i == self._nLayers - 1:
t = np.dot(self.add_ones(t), self._W[i])
else:
t = np.tanh(np.dot(self.add_ones(t), self._W[i]))
#t = self.RBF(np.dot(np.hstack((np.ones((t.shape[0],1)),t)),self._W[i]))
return (t, Z)
def backward(self, error, Z, T, lmb=0):
delta = error
N = T.size
dws = []
for i in range(self._nLayers - 1, -1, -1):
rh = float(self.rho[i]) / N
if i==0:
lmbterm = 0
else:
lmbterm = lmb * np.vstack((np.zeros((1, self._W[i].shape[1])),
self._W[i][1:,]))
dws.insert(0,(-rh * np.dot(self.add_ones(Z[i]).T, delta) + lmbterm))
if i != 0:
delta = np.dot(delta, self._W[i][1:, :].T) * (1 - Z[i]**2)
return self.pack(dws)
def _errorf(self, T, Y):
return T - Y
def _objectf(self, T, Y, wpenalty):
return 0.5 * np.mean(np.square(T - Y)) + wpenalty
def train(self, X, T, **params):
verbose = params.pop('verbose', False)
# training parameters
_lambda = params.pop('Lambda', 0.)
#parameters for scg
niter = params.pop('niter', 1000)
wprecision = params.pop('wprecision', 1e-10)
fprecision = params.pop('fprecision', 1e-10)
wtracep = params.pop('wtracep', False)
ftracep = params.pop('ftracep', False)
# optimization
optim = params.pop('optim', 'scg')
if self.stdX is None:
explore = params.pop('explore', False)
self.stdX = Standardizer(X, explore)
Xs = self.stdX.standardize(X)
if self.stdT is None and self.stdTarget:
self.stdT = Standardizer(T)
T = self.stdT.standardize(T)
def gradientf(weights):
self.unpack(weights)
Y,Z = self.forward(Xs)
error = self._errorf(T, Y)
return self.backward(error, Z, T, _lambda)
def optimtargetf(weights):
""" optimization target function : MSE
"""
self.unpack(weights)
#self._weights[:] = weights[:] # unpack
Y,_ = self.forward(Xs)
Wnb=np.array([])
for i in range(self._nLayers):
if len(Wnb)==0: Wnb=self._W[i][1:,].reshape(self._W[i].size-self._W[i][0,].size,1)
else: Wnb = np.vstack((Wnb,self._W[i][1:,].reshape(self._W[i].size-self._W[i][0,].size,1)))
wpenalty = _lambda * np.dot(Wnb.flat ,Wnb.flat)
return self._objectf(T, Y, wpenalty)
if optim == 'scg':
result = scg(self.cp_weight(), gradientf, optimtargetf,
wPrecision=wprecision, fPrecision=fprecision,
nIterations=niter,
wtracep=wtracep, ftracep=ftracep,
verbose=False)
self.unpack(result['w'][:])
self.f = result['f']
elif optim == 'steepest':
result = steepest(self.cp_weight(), gradientf, optimtargetf,
nIterations=niter,
xPrecision=wprecision, fPrecision=fprecision,
xtracep=wtracep, ftracep=ftracep )
self.unpack(result['w'][:])
if ftracep:
self.ftrace = result['ftrace']
if 'reason' in result.keys() and verbose:
print(result['reason'])
return result
def use(self, X, retZ=False):
if self.stdX:
Xs = self.stdX.standardize(X)
else:
Xs = X
Y, Z = self.forward(Xs)
if self.stdT is not None:
Y = self.stdT.unstandardize(Y)
if retZ:
return Y, Z
return Y
Testing the non-linear regression code on a sample dataset
X_dash = np.array([0,0,1,0,0,1,1,1]).reshape(4,2)
T_dash = np.array([0,1,1,0,1,0,0,1]).reshape(4,2)
nn = NeuralNet([2,3,2])
nn.train(X_dash,T_dash, wprecision=1e-20, fprecision=1e-2)
Y = nn.use(X_dash)
plt.plot(Y)
plt.show()
Deciding on the input values and the target variables used to train the neural network
X = df_all.iloc[:, :-2].values
Our target variable in this case is the quality of the wine, which will be our regression target
T = df_all.iloc[:, 11:12].values
print(X.shape, T.shape)
X_train, X_test, T_train, T_test = train_test_split(X, T, test_size=0.2)
print(X_train.shape, X_test.shape, T_train.shape, T_test.shape)
nn = NeuralNet([X_train.shape[1], 5, T_train.shape[1]])
nn.train(X_train, T_train)
T_pred = nn.use(X_test)
print(T_pred.shape)
plt.plot(T_test[:100])
plt.plot(T_pred[:100])
print("Accuracy: ", 100 - np.mean(np.abs(T_test - T_pred)) * 100, "%")
print(" RMSE: ",np.sqrt(np.mean((T_test - T_pred)**2)))
from util import Standardizer
import numpy as np
import matplotlib.pyplot as plt
import copy
from grad import scg, steepest
class NeuralNetLogReg:
def __init__(self, nunits):
self._nLayers=len(nunits)-1
self.rho = [1] * self._nLayers
self._W = []
wdims = []
lenweights = 0
for i in range(self._nLayers):
nwr = nunits[i] + 1
nwc = nunits[i+1]
wdims.append((nwr, nwc))
lenweights = lenweights + nwr * nwc
self._weights = np.random.uniform(-0.1,0.1, lenweights)
start = 0 # fixed index error 20110107
for i in range(self._nLayers):
end = start + wdims[i][0] * wdims[i][1]
self._W.append(self._weights[start:end])
self._W[i].resize(wdims[i])
start = end
self.stdX = None
self.stdT = None
self.stdTarget = True
def add_ones(self, w):
return np.hstack((np.ones((w.shape[0], 1)), w))
def get_nlayers(self):
return self._nLayers
def set_hunit(self, w):
for i in range(self._nLayers-1):
if w[i].shape != self._W[i].shape:
print("set_hunit: shapes do not match!")
break
else:
self._W[i][:] = w[i][:]
def pack(self, w):
return np.hstack(map(np.ravel, w))
def unpack(self, weights):
self._weights[:] = weights[:] # unpack
def cp_weight(self):
return copy.copy(self._weights)
def RBF(self, X, m=None,s=None):
if m is None: m = np.mean(X)
if s is None: s = 2 #np.std(X)
r = 1. / (np.sqrt(2*np.pi)* s)
return r * np.exp(-(X - m) ** 2 / (2 * s ** 2))
def forward(self,X):
t = X
Z = []
for i in range(self._nLayers):
Z.append(t)
if i == self._nLayers - 1:
t = 1/(1+np.exp(-np.dot(self.add_ones(t), self._W[i])))
#print(t)
else:
t = np.tanh(np.dot(self.add_ones(t), self._W[i]))
return (t, Z)
def backward(self, error, Z, T, lmb=0):
delta = error
N = T.size
dws = []
for i in range(self._nLayers - 1, -1, -1):
rh = float(self.rho[i]) / N
if i==0:
lmbterm = 0
else:
lmbterm = lmb * np.vstack((np.zeros((1, self._W[i].shape[1])),
self._W[i][1:,]))
dws.insert(0,(-rh * np.dot(self.add_ones(Z[i]).T, delta) + lmbterm))
if i != 0:
delta = np.dot(delta, self._W[i][1:, :].T) * (1 - Z[i]**2)
return self.pack(dws)
def _errorf(self, T, Y):
return T - Y
def _objectf(self, T, Y, wpenalty):
return -(np.sum( np.sum((T * np.log(Y)) , axis=1), axis=0)) + wpenalty
def train(self, X, T,**params):
verbose = params.pop('verbose', False)
# training parameters
_lambda = params.pop('Lambda', 0.05)
#parameters for scg
niter = params.pop('niter', 1000)
wprecision = params.pop('wprecision', 1e-10)
fprecision = params.pop('fprecision', 1e-10)
wtracep = params.pop('wtracep', False)
ftracep = params.pop('ftracep', False)
# optimization
optim = params.pop('optim', 'scg')
if self.stdX is None:
explore = params.pop('explore', False)
self.stdX = Standardizer(X, explore)
Xs = self.stdX.standardize(X)
if self.stdT is None and self.stdTarget and False: # 'and False' intentionally disables target standardization for the 0/1 logistic targets
self.stdT = Standardizer(T)
T = self.stdT.standardize(T)
def gradientf(weights):
self.unpack(weights)
Y,Z = self.forward(Xs)
error = self._errorf(T, Y)
return self.backward(error, Z, T, _lambda)
def optimtargetf(weights):
""" optimization target function : cross-entropy
"""
self.unpack(weights)
Y,_ = self.forward(Xs)
Wnb=np.array([])
for i in range(self._nLayers):
if len(Wnb)==0: Wnb=self._W[i][1:,].reshape(self._W[i].size-self._W[i][0,].size,1)
else: Wnb = np.vstack((Wnb,self._W[i][1:,].reshape(self._W[i].size-self._W[i][0,].size,1)))
wpenalty = _lambda * np.dot(Wnb.flat ,Wnb.flat)
return self._objectf(T, Y, wpenalty)
if optim == 'scg':
result = scg(self.cp_weight(), gradientf, optimtargetf,
wPrecision=wprecision, fPrecision=fprecision,
nIterations=niter,
wtracep=wtracep, ftracep=ftracep,
verbose=False)
self.unpack(result['w'][:])
self.f = result['f']
elif optim == 'steepest':
result = steepest(self.cp_weight(), gradientf, optimtargetf,
nIterations=niter,
xPrecision=wprecision, fPrecision=fprecision,
xtracep=wtracep, ftracep=ftracep )
self.unpack(result['w'][:])
if ftracep:
self.ftrace = result['ftrace']
if 'reason' in result.keys() and verbose:
print(result['reason'])
return result
def use(self, X, retZ=False):
if self.stdX:
Xs = self.stdX.standardize(X)
else:
Xs = X
Y, Z = self.forward(Xs)
if self.stdT is not None:
Y = self.stdT.unstandardize(Y)
if retZ:
return Y, Z
return Y
X_dashlog = np.array([0,0,1,0,0,1,1,1]).reshape(4,2)
T_dashlog = np.array([0,1,1,0,1,0,0,1]).reshape(4,2)
nn = NeuralNetLogReg([2,3,2])
nn.train(X_dashlog,T_dashlog, wprecision=1e-20, fprecision=1e-2)
Y_dashlog = nn.use(X_dashlog)
plt.plot(Y_dashlog)
plt.show()
df_all
categorical_cleanup = {"color":{"R":1, "W":0}}
# df_all.replace(categorical_cleanup, inplace=True)
df_all.head()
X = df_all.iloc[:, :-2].values
Our target variable this time is the type of wine, indicating whether it is white or red
T = df_all.iloc[:, 12:].values
print(X.shape, T.shape)
print(np.unique(T))
X_train, X_test, T_train, T_test = train_test_split(X, T, test_size=0.2)
X_train.shape
X_test.shape
T_train.shape
T_test.shape
nn = NeuralNetLogReg([X_train.shape[1],10,T_train.shape[1]])
nn.train(X_train,T_train)
Y = nn.use(X_test)
print(Y)
plt.plot(T_test[:100])
plt.plot(Y[:100])
plt.show()
print("Accuracy: ", 100 - np.mean(np.abs(T_test - Y)) * 100, "%")
print(" RMSE: ",np.sqrt(np.mean((T_test - Y)**2)))
Based on the above graphs, the applied methods do not directly yield an accurate regression or classification model. The high accuracy desired for this dataset is not achieved with this approach. This can be due to a number of reasons, such as the attributes not carrying enough weight to determine the final target variable. It should be noted that wine quality is usually subjective: the factors that make one wine better or worse than another largely depend on the taster. Another reason could be the combination of hidden layers and neurons used, where even slight tweaking can affect the final accuracy and RMSE.
Discuss the challenges or something that you learned. If you have any suggestions about the assignment, you can write about them.
This assignment has by far been the most challenging. Understanding how non-linear regression and non-linear logistic regression work has been its focus. The fact that regression itself can be represented by a non-linear function shows the different use cases possible for tracking the progression of the target variable.
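As an illustration of where this non-linearity comes from, the sketch below mirrors the shape of the `NeuralNet.forward` method used above: a tanh hidden layer followed by a linear output. The weights here are random stand-ins, not trained values.

```python
import numpy as np

# Illustrative sketch only: tanh hidden layer + linear output layer,
# the same structure as NeuralNet.forward above. Weights are random stand-ins.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(4, 2))        # 4 samples, 2 inputs
W1 = rng.uniform(-0.1, 0.1, size=(3, 3))   # (2 inputs + bias) x 3 hidden units
W2 = rng.uniform(-0.1, 0.1, size=(4, 1))   # (3 hidden + bias) x 1 output

def add_ones(A):
    # prepend a bias column of ones, as in NeuralNet.add_ones
    return np.hstack((np.ones((A.shape[0], 1)), A))

Z = np.tanh(add_ones(X) @ W1)  # nonlinear hidden layer
Y = add_ones(Z) @ W2           # linear output layer (regression)
```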
Non-linear logistic regression was implemented by reusing the same neural network design, modifying the NeuralNet class's output layer and its train method. The objective function applied here differs from that of non-linear regression: a cross-entropy loss on sigmoid outputs replaces the squared error. So far, non-linear logistic regression has given a better score than non-linear regression.
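A minimal sketch of that difference, using illustrative arrays rather than the notebook's data: the regression network pairs a linear output with a squared-error objective, while the logistic version pushes the output through a sigmoid and minimizes cross-entropy, matching `NeuralNetLogReg.forward` and `_objectf` above.

```python
import numpy as np

def squared_error(T, Y):
    # objective of the regression network (NeuralNet._objectf, without penalty)
    return 0.5 * np.mean((T - Y) ** 2)

def cross_entropy(T, Y, eps=1e-12):
    # objective of the logistic network (NeuralNetLogReg._objectf, without penalty)
    return -np.sum(T * np.log(Y + eps))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

A = np.array([[2.0], [-1.5]])   # illustrative pre-activations
Y = sigmoid(A)                  # logistic outputs stay in (0, 1)
T = np.array([[1.0], [0.0]])    # 0/1 targets, as in the color classification
```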
Another concept covered in this assignment is k-fold cross validation, in which the model is evaluated not on a single training and testing split but on several interchangeable ones. The technique was tested with both non-linear regression and non-linear logistic regression. A further benefit of this technique is that the model can be evaluated under different hyperparameters; by doing so, we can fine-tune the model for better accuracy.
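The fold bookkeeping can be sketched as follows; `kfold_indices` is a hypothetical helper, a simplified stand-in for the lower/upper slicing plus `np.delete` pattern in the `crossValidation` function above.

```python
import numpy as np

def kfold_indices(n_samples, folds):
    # Yield (train, test) index arrays, mirroring the lower/upper slicing
    # plus np.delete pattern used in crossValidation above.
    size = n_samples // folds
    for i in range(folds):
        test = np.arange(size * i, size * (i + 1))
        train = np.delete(np.arange(size * folds), test)
        yield train, test

splits = list(kfold_indices(100, 5))
```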
Now you are testing various activation functions in this section. Use the best neural network structure and explore 3 different activation functions of your choice (one should be tanh that you used in the previous sections). You should use cross validation to discover the best model (with activation function).
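For reference, the three activations compared in this report can be defined as below; relu is an assumed third choice alongside the tanh and sigmoid already used in the networks above.

```python
import numpy as np

# The three candidate hidden-layer activations; relu is an assumed choice,
# tanh and sigmoid match the layers used in the networks above.
def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
```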
One extra credit is assigned when you finish the work completely.
DO NOT forget to submit your data! Your notebook is supposed to run fine after running your codes.
Note: this is a WRITING assignment. Proper writing is REQUIRED. Comments are not considered as writing.
| points | description | |
|---|---|---|
| 5 | Overview | states the objective and the approach |
| 10 | Data | |
| 2 | Includes description of your data | |
| 3 | Plots to visualize data | |
| 5 | Reading and analyzing the plots | |
| 40 | Methods | |
| 10 | Summary of CV & correctness of implementation | |
| 5 | Summary of nonlinear regression | |
| 5 | Explanation of codes | |
| 5 | Summary of nonlinear logistic regression | |
| 5 | Explanation of codes | |
| 10 | Examination of correct implementation (NonlinearLogReg) with toy data. | |
| 40 | Results | Your Data |
| 10 | Presentation of CV results | |
| 10 | Discussions about parameter/network structure choice | |
| 10 | plots for results | |
| 10 | Discussion about the prediction results. Try to analyze what nonlinear regression model learned. | |
| 5 | Conclusions |